Sensitivity analysis of semantic similarity measures

Semantic similarity metrics are often used to compare distantly related biological entities based on their ontological annotations. Although these measures are widely used, it is unclear how sensitive they are to different levels of partial relatedness. Beyond the choice of similarity metric, additional parameters, such as the profile aggregation approach and the way Information Content (IC) is measured, also affect performance. Here, we compare a subset of similarity metrics in combination with different choices for these parameters.

Similarity metrics

Profile aggregation approaches

Different approaches to measure Information Content (IC)

Comparison of decay

Here, we compare how the similarity scores from different metrics decay as the compared profiles become less related. A faux database of profiles is created, and query profiles of a given size are selected at random from it. Each query profile is compared to every profile in the database, and the best-match similarity is recorded. The query profiles are then decayed incrementally by replacing their annotations with randomly chosen annotations, and the best-match similarity is recorded again after each replacement.
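The decay procedure can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the analysis itself: profiles are plain sets of annotation identifiers, Jaccard similarity stands in for the metrics under study, and the helper names (make_database, best_match, decay_curve), the term identifiers, and the sizes are all hypothetical.

```python
import random


def jaccard(a, b):
    """Jaccard similarity of two annotation sets (a stand-in metric, assumed here)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def make_database(vocabulary, n_profiles, profile_size, rng):
    """Build a faux database: each profile is a random set of annotations."""
    return [set(rng.sample(vocabulary, profile_size)) for _ in range(n_profiles)]


def best_match(query, database):
    """Highest similarity between the query and any profile in the database."""
    return max(jaccard(query, profile) for profile in database)


def decay_curve(database, vocabulary, query_index, rng):
    """Replace the query's annotations one at a time with random ones,
    recording the best-match similarity before and after each replacement."""
    original = sorted(database[query_index])  # sorted for reproducibility
    query = set(original)
    rng.shuffle(original)  # random order in which annotations are replaced
    scores = [best_match(query, database)]
    for old in original:
        # Draw a replacement that is neither in the current query nor in the
        # original profile, so every step strictly reduces the original content.
        candidates = [t for t in vocabulary if t not in query and t not in original]
        query.remove(old)
        query.add(rng.choice(candidates))
        scores.append(best_match(query, database))
    return scores


rng = random.Random(0)  # fixed seed so the run is repeatable
vocabulary = [f"TERM:{i:04d}" for i in range(200)]  # hypothetical annotation IDs
database = make_database(vocabulary, n_profiles=50, profile_size=10, rng=rng)
scores = decay_curve(database, vocabulary, query_index=0, rng=rng)
print(scores)
```

The first score is the undecayed query matched against its own database entry; later scores show how quickly the best match falls as annotations are replaced. Swapping jaccard for another metric would, under these assumptions, give the corresponding decay curve for that metric.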

View analysis

Comparison of distinguishing signal from noise

Here, we compare how well different metrics distinguish real similarity from noise.

View analysis

